
tests: Add the cargo-fuzz fuzzing suite #36982

Draft
def- wants to merge 2 commits into MaterializeInc:main from def-:fuzz-suite


def- commented Jun 11, 2026


Adds the cargo-fuzz suite: fuzz targets, seed corpora, dictionaries, and the runner/CI integration (release-qualification, 24h). Covers the SQL parser/pretty-printer, repr (strconv, jsonb, Row codec/proto, arithmetic oracles), the expr optimizer transforms, the Avro/Protobuf/CSV/pgwire/pgcopy decoders, pgrepr/pgtz, the upsert state machine, persist durable-state decode, and proto round-trips across storage-types/persist/catalog/external table descs.

Also includes the production-side enablement the targets require (the fuzzing/fuzz Cargo features and #[doc(hidden)]/cfg-gated re-exports) and the macOS build fixes those exposures necessitated.

This is the infrastructure PR — mostly mechanical (generated corpora/dicts). The individual bugs it surfaced are split into separate per-subsystem PRs.

Depends on:

def- added a commit that referenced this pull request Jun 15, 2026
Hardens the `mz-avro` decoder against adversarial input (Avro
bytes/schemas arrive from Kafka and an external registry, so a panic/OOM
is an availability bug): bound per-block array/map lengths and object
counts by remaining input, cap object-container block byte length, bound
schema-parse/value-decode recursion, and fix two schema-resolution
panics on unmatched named types.
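
The "bound lengths by remaining input" idea can be sketched as follows. This is a minimal illustration, not the `mz-avro` code: `bounded_len`, `LengthExceedsInput`, and the `min_item_bytes` parameter are hypothetical names, and the sketch assumes the element count has already been decoded to a `usize` and that each element occupies at least `min_item_bytes` bytes on the wire.

```rust
/// Hypothetical error for this sketch: the claimed element count cannot
/// fit in the bytes that are left to decode.
#[derive(Debug, PartialEq)]
struct LengthExceedsInput {
    claimed: usize,
    max_items: usize,
}

/// Validates an untrusted element count before anything is allocated for it.
///
/// Assumption: every element occupies at least `min_item_bytes` bytes, so
/// `remaining / min_item_bytes` is an upper bound on how many can follow.
fn bounded_len(
    claimed: usize,
    remaining: usize,
    min_item_bytes: usize,
) -> Result<usize, LengthExceedsInput> {
    // Treat a minimum size of 0 as 1 to avoid dividing by zero.
    let max_items = remaining / min_item_bytes.max(1);
    if claimed > max_items {
        Err(LengthExceedsInput { claimed, max_items })
    } else {
        Ok(claimed)
    }
}
```

A caller would pass the result to something like `Vec::with_capacity` only after this check succeeds, so a few bytes of input cannot request a huge allocation.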

Found by the cargo-fuzz suite ([separate infra
PR](#36982)). Each fix
has a regression test.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
def- added a commit that referenced this pull request Jun 16, 2026
Print→reparse round-trip bugs in the SQL parser and pretty-printer,
surfaced by the grammar-aware fuzz target. Each fix has a regression
test; the sqllogictest/testdrive plan goldens are refreshed to match.

Themes: quoting bare keyword identifiers (any/all/some/list,
context-sensitive keywords), parenthesizing low-precedence operands
(prefix ops, casts, COLLATE, quantified comparisons), special-form
display correctness (EXTRACT/POSITION/SUBSCRIBE), and bounding parser
recursion/backtracking to reject pathological inputs.
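
The print→reparse property that such a target checks can be sketched generically. This is an illustration only: `check_roundtrip` and its closure parameters are hypothetical and stand in for the real parser and pretty-printer, which are not shown here.

```rust
use std::fmt::Debug;

/// Checks that printing a parsed value and parsing the result again
/// yields the same value. Inputs that do not parse are skipped, since
/// the property only concerns values the parser accepts.
fn check_roundtrip<T, P, D>(input: &str, parse: P, display: D) -> Result<(), String>
where
    T: PartialEq + Debug,
    P: Fn(&str) -> Option<T>,
    D: Fn(&T) -> String,
{
    let Some(ast) = parse(input) else {
        return Ok(());
    };
    let printed = display(&ast);
    match parse(&printed) {
        Some(reparsed) if reparsed == ast => Ok(()),
        Some(reparsed) => Err(format!(
            "{input:?} printed as {printed:?}, which reparses as {reparsed:?} instead of {ast:?}"
        )),
        None => Err(format!(
            "{input:?} printed as {printed:?}, which no longer parses"
        )),
    }
}
```

A printer that drops required quoting or parentheses fails this check in one of the two error arms: either the printed text parses to a different tree, or it stops parsing altogether.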

Found by the cargo-fuzz suite ([separate infra
PR](#36982)).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
def- force-pushed the fuzz-suite branch 2 times, most recently from 7f0a2e2 to 4380619 on June 18, 2026 10:19
def- added a commit that referenced this pull request Jun 22, 2026
…37177)

`RowPacker::push_array_with_unchecked` and `push_array_with_row_major`
compute an array's expected cardinality as the product of its dimension
lengths and compare it against the actual number of elements pushed. The
product was an unchecked `usize` multiply (`dims.iter()...product()` /
`cardinality *= dim.length`), so dimension lengths whose product exceeds
`usize::MAX` overflowed.

Under overflow checks (debug, and the cargo-fuzz build) this panics; in
release it silently wraps, and a wrapped value can even spuriously match
the actual element count — accepting a corrupt array (e.g. dims claiming
`[2^32, 2^32]` wrap to a cardinality of 0, matching an empty element
list). This is reachable from `Row::decode` over an attacker- or
corruption-supplied `ProtoRow`, since the proto array dimensions are not
otherwise bounded.

Saturate the product to `usize::MAX` instead. An overflowing cardinality
is impossibly large — no array can hold that many elements — so it never
equals the real element count and the existing check rejects it as
`WrongCardinality`, turning the panic/silent-wrap into a clean error on
both build profiles.
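
The difference between the wrapping and saturating products can be sketched in isolation. This is a simplified illustration, not the `RowPacker` code: `Dim`, `wrapping_cardinality`, `saturating_cardinality`, and `check_cardinality` are hypothetical names, and the `WrongCardinality` struct here only mirrors the error named above.

```rust
/// Hypothetical stand-in for an array dimension; only the length matters here.
#[derive(Debug, Clone, Copy)]
struct Dim {
    length: usize,
}

/// Models the release-build behavior of an unchecked multiply: silent wrap.
fn wrapping_cardinality(dims: &[Dim]) -> usize {
    dims.iter().fold(1usize, |acc, d| acc.wrapping_mul(d.length))
}

/// Saturates at `usize::MAX` instead of wrapping or panicking.
fn saturating_cardinality(dims: &[Dim]) -> usize {
    dims.iter().fold(1usize, |acc, d| acc.saturating_mul(d.length))
}

/// Illustrative error carrying the mismatch.
#[derive(Debug, PartialEq)]
struct WrongCardinality {
    expected: usize,
    actual: usize,
}

/// Rejects element counts that do not match the claimed dimensions.
fn check_cardinality(dims: &[Dim], actual: usize) -> Result<(), WrongCardinality> {
    let expected = saturating_cardinality(dims);
    if expected == actual {
        Ok(())
    } else {
        Err(WrongCardinality { expected, actual })
    }
}
```

With dimensions whose true product is one past `usize::MAX`, the wrapping version yields 0 and would match an empty element list, while the saturating version yields `usize::MAX` and the check fails cleanly.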

Found by the `repr::row_codec_roundtrip` cargo-fuzz target in
#36982

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
def- added a commit that referenced this pull request Jun 22, 2026
All found via #36982

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
def- added a commit that referenced this pull request Jun 23, 2026
Hardens `mz-repr` proto/Row decoding against malformed/untrusted bytes —
these paths are reachable from persisted state and the wire, so a panic
is an availability bug. Replaces panics/asserts with proper decode
errors and validates ranges (Date, CheckedTimestamp, ProtoNumeric,
ProtoRange, ProtoRelationDesc, ProtoRow dict ordering, uuid parsing,
leap-second truncation, Avro decimal).

Found by the cargo-fuzz suite ([separate infra
PR](#36982)). Each fix
has a regression test.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Moritz Hoffmann <antiguru@gmail.com>
def- added a commit that referenced this pull request Jun 25, 2026
…nput (#36985)

Hardens decode paths in `persist`, `persist-client`, `persist-types`,
`pgrepr`, and `postgres-util` against malformed/untrusted input: reject
rollup-less/empty-frontier/hollow-only/invalid StateDiff state instead
of panicking or debug-asserting, guard UUID parsing, fix i16 overflow in
pgrepr numeric-scale binary decode, and propagate u16-conversion errors
in the postgres table-desc protos.
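
Propagating a narrowing-conversion error instead of panicking can look like the sketch below. `ColumnDesc`, `col_num`, and `column_from_proto` are hypothetical names for illustration; the actual proto types and fields are not shown here.

```rust
use std::num::TryFromIntError;

/// Hypothetical decoded descriptor with a 16-bit column number.
#[derive(Debug, PartialEq)]
struct ColumnDesc {
    col_num: u16,
}

/// Converts a wider, untrusted wire value into the narrower in-memory type.
/// An out-of-range value becomes an error for the caller to handle,
/// rather than a panic from `unwrap()` or a silent truncation from `as`.
fn column_from_proto(raw_col_num: u32) -> Result<ColumnDesc, TryFromIntError> {
    let col_num = u16::try_from(raw_col_num)?;
    Ok(ColumnDesc { col_num })
}
```

The same `try_from` pattern applies to a signed 16-bit scale, using `i16::try_from` on the wider decoded value.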

Found by the cargo-fuzz suite ([separate infra
PR](#36982)). Each fix
has a regression test.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
def- force-pushed the fuzz-suite branch 3 times, most recently from 36e969a to a53a715 on June 25, 2026 20:05
Squashes every fuzzing-infrastructure change in this branch into one commit,
separate from the individual bug fixes the fuzzing surfaced (one commit each):

* The cargo-fuzz crates under `src/*/fuzz` — targets, seed corpora,
  dictionaries, and `prepare-corpus.sh` scripts — covering the SQL parser /
  pretty-printer, repr (strconv, jsonb, Row codec/proto, arithmetic oracles),
  the expr optimizer transforms, Avro/Protobuf/CSV/pgwire/pgcopy decoders,
  pgrepr/pgtz, the upsert state machine, persist durable-state decode, and the
  proto round-trips across storage-types/persist/catalog/external table descs.
* The harness and runner wiring: `--profile fruitful`, `--jobs auto`,
  per-crate sharding, artifact-based crash detection, `.repro.txt` sidecars,
  a time-capped post-fuzz corpus minimize/upload, and the auto-generated
  `buf.yaml` fuzz-crate excludes.
* CI: move cargo-fuzz from nightly to release qualification (24h, 48-core).
* The production-side enablement the targets require: the `fuzzing` Cargo
  feature and the `#[doc(hidden)]` / `cfg`-gated re-exports that expose upsert,
  persist-client, and pgwire internals to the fuzz crates.
* The macOS build fix those exports necessitated: switching the affected
  storage `Stream::inspect` calls to `InspectCore::inspect_container`, which
  avoids the objc2-driven trait-solver overflow the `Inspect` bound triggers
  on macOS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes: CLU-137

Fixes two user-visible bugs in how `reduce` treats an erroring operand of
a non-strict AND/OR:

* A materialized view or query whose `mz_now()` temporal filter sits under
  an OR of ANDs, e.g.

      WHERE (a AND ... AND mz_now() < t) OR (b AND ... AND mz_now() < t)

  could panic the compute worker (or fail to plan) with "Unsupported
  temporal predicate". The shared `mz_now() < t` conjunct was left buried
  inside the OR instead of being factored out, so temporal-filter
  extraction failed.

* A query that should short-circuit, such as `WHERE col AND (1/0 = 1)`,
  could spuriously fail at runtime (e.g. "division by zero") even on rows
  where `col` is false. AND/OR are non-strict: `false AND <error>` is
  `false` and `true OR <error>` is `true`, so such rows must be filtered,
  not errored.
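
The non-strict behavior described above can be modeled with a small evaluator. This is a simplified sketch, not the actual evaluation code: `EvalError`, `Value`, `eval_and`, and `eval_or` are hypothetical names, NULL is left out, and the sketch assumes a dominating `false` (for AND) or `true` (for OR) wins regardless of operand order.

```rust
/// Hypothetical runtime error for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct EvalError(String);

/// An operand either evaluates to a boolean or errors. NULL is omitted.
type Value = Result<bool, EvalError>;

/// AND: any `false` operand decides the result, even next to an error.
fn eval_and(operands: &[Value]) -> Value {
    let mut first_err = None;
    for op in operands {
        match op {
            Ok(false) => return Ok(false),
            Ok(true) => {}
            Err(e) => {
                if first_err.is_none() {
                    first_err = Some(e.clone());
                }
            }
        }
    }
    match first_err {
        Some(e) => Err(e),
        None => Ok(true),
    }
}

/// OR: any `true` operand decides the result, even next to an error.
fn eval_or(operands: &[Value]) -> Value {
    let mut first_err = None;
    for op in operands {
        match op {
            Ok(true) => return Ok(true),
            Ok(false) => {}
            Err(e) => {
                if first_err.is_none() {
                    first_err = Some(e.clone());
                }
            }
        }
    }
    match first_err {
        Some(e) => Err(e),
        None => Ok(false),
    }
}
```

Under this model, a row where `col` is false evaluates `col AND (1/0 = 1)` to `false` and is filtered out, while a row where `col` is true surfaces the division error.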

Mechanism: the generic variadic fold in `reduce` replaced a call with any
operand's literal error unconditionally, which is wrong for a function that
is not strict in errors. Fold only for the strict variadics, excluding
AND/OR and ErrorIfNull (which can absorb an operand's error at runtime).
Keeping the erroring operand then reaches `undistribute_and_or`, which
recombines operands across AND/OR's short-circuit boundary. That only
preserves error semantics for operands common to every disjunct, so skip
undistribution otherwise. A shared temporal predicate (`mz_now() < t`,
whose cast can error) is common to every disjunct, so it is still factored
out and stays extractable, unlike the reverted MaterializeInc#37049's blunt
`could_error()` skip.
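
The restricted fold can be sketched as a guard on the function kind. This is an illustration of the rule stated above, not the `reduce` implementation: `VariadicFunc`, `Expr`, `propagates_operand_errors`, and `fold_literal_error` are hypothetical simplifications, and `OtherStrict` is a placeholder for any variadic function that is strict in errors.

```rust
/// Hypothetical subset of variadic functions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VariadicFunc {
    And,
    Or,
    ErrorIfNull,
    /// Placeholder for any function that is strict in operand errors.
    OtherStrict,
}

/// Hypothetical, heavily simplified expression type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(usize),
    Literal(Result<bool, String>),
}

/// AND, OR, and ErrorIfNull can absorb an operand's error at runtime,
/// so a literal error operand does not determine their result.
fn propagates_operand_errors(func: VariadicFunc) -> bool {
    !matches!(
        func,
        VariadicFunc::And | VariadicFunc::Or | VariadicFunc::ErrorIfNull
    )
}

/// Returns the error the whole call folds to, if folding is sound.
fn fold_literal_error(func: VariadicFunc, operands: &[Expr]) -> Option<String> {
    if !propagates_operand_errors(func) {
        return None;
    }
    operands.iter().find_map(|e| match e {
        Expr::Literal(Err(msg)) => Some(msg.clone()),
        _ => None,
    })
}
```

For `col AND <literal error>`, the guard leaves the call intact so that rows where `col` is false can still short-circuit at runtime.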

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>